The following is a brief linguistic analysis of the use of racially charged language in William Faulkner’s Absalom, Absalom!. Faulkner’s representation of race was complicated, just as his own relationship with race was complex. As a Southern white moderate, he voiced his anguish over the dehumanization of African Americans under Jim Crow segregation and, at the same time, could casually refer to people as “niggers” during the public retelling of a comic story. Indeed, there is no shortage of literature on Faulkner and race in general, and with regard to Absalom, Absalom! in particular. Given this extensive critical history, it almost goes without saying that a computational analysis of word choice, especially with regard to racially charged language, cannot do justice to the complexities and nuances of either the text or Faulkner’s broader critical intervention. Nevertheless, using techniques common in corpus linguistics (CL), it is possible to give a bird’s-eye view of how the use of certain words is patterned. This pattern can then, in turn, inform subsequent close readings.
The following piece uses several techniques available to standard CL analysis, and one more complex analysis that is exclusively available to practitioners who have access to the Digital Yoknapatawpha data set. These different techniques have been split into their own sections.
All of the data was generated in the R programming language, using the tidyverse suite of packages for the calculations and the plotly library for the graphics. The full repository is available at https://github.com/joostburgers/absalom_sentiment_analysis. Due to copyright issues, the repository does not include the Absalom, Absalom! text file used for data analysis.
With any textual analysis, some pre-processing is required. The steps that follow are standard procedures in CL. The text of Absalom, Absalom! was read in as a txt file. It was then broken into nine chapters, and further subdivided into sentences. The individual words were subsequently “tokenized.” Tokenization splits the text into individual words and, as applied here, removes capital letters, special characters, and punctuation. It enables the computer to compare words more easily. Each “stop word” was then removed. These are words like the, a, on, and at that are very frequent within any text and do not add to the analysis. The words were then lemmatized. Lemmatization reduces a word to its base form, or lemma. For example, Negroes becomes Negro. This way all instances of the concept “Negro” are unified as one instance. This prevents creating separate counts for words like Negro, Negroes, and Negro’s.
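The pre-processing steps described above can be sketched in a few lines. The original analysis was done in R with the tidyverse; what follows is a standard-library Python illustration, not the author's code. The stop-word list, the lemma table, and the sample text are hypothetical stand-ins, the sentence splitter is deliberately naive, and chapter splitting is omitted.

```python
import re

# Hypothetical stop-word list and lemma table, for illustration only.
STOP_WORDS = {"the", "a", "an", "on", "at", "of", "and", "to", "in"}
LEMMAS = {"negroes": "negro", "negro's": "negro"}

def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic runs (with an optional apostrophe part)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())

def preprocess(text):
    """Return (sentence_number, lemma) rows with stop words removed."""
    rows = []
    for number, sentence in enumerate(split_sentences(text), start=1):
        for token in tokenize(sentence):
            if token in STOP_WORDS:
                continue
            # Fall back to the token itself when the table has no lemma for it.
            rows.append((number, LEMMAS.get(token, token)))
    return rows

sample = "The Negroes walked on. A Negro's horse stood at the gate!"
rows = preprocess(sample)
```

Keeping the sentence number on every row is a design choice that pays off later: it lets word counts be regrouped by sentence, chapter, or any other unit without re-reading the text.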
The resulting slate of words was tagged as racially charged by adding a column called race_word and indicating TRUE or FALSE for each word. This was done by creating a list of racial words and joining it to the data table through a left join. Essentially, the join checks whether a word like “Negro”, “White”, or “Octoroon” occurs and tags it as TRUE. Such a list of racial words is necessarily imperfect, as the words “black” and “white” could also denote colors and not racial designations. Still, with this pre-processing complete it is possible to provide some key statistical insights.
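As a rough illustration of the tagging step, the sketch below mimics a left join in plain Python: every word is kept, and a race_word flag records whether it matches the list. This is an assumption-laden stand-in for the author's R code; the word list is a hypothetical fragment and the helper names are invented for this example.

```python
from collections import Counter

# Hypothetical fragment of a racial-word list; the author's full list is not reproduced here.
RACE_WORDS = {"negro", "white", "octoroon", "black"}

def tag_race_words(words, race_words=RACE_WORDS):
    """Keep every word (as a left join would) and add a True/False race_word flag."""
    return [{"word": w, "race_word": w in race_words} for w in words]

def top_n(tagged, flag, n=10):
    """Most frequent words among the rows whose race_word flag equals `flag`."""
    counts = Counter(row["word"] for row in tagged if row["race_word"] == flag)
    return counts.most_common(n)

words = ["henry", "white", "sutpen", "negro", "henry", "white", "octoroon", "white"]
tagged = tag_race_words(words)
top_racial = top_n(tagged, True)
top_other = top_n(tagged, False)
```

Note that the flag is assigned purely by string match, which is exactly why “black” and “white” used as colors would be tagged as well.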
The chart below shows the ten most frequent non-racial words and racial words in the text. Hovering over the individual bars reveals their precise number, and clicking on TRUE and FALSE turns that particular series on and off.
What is immediately noticeable is that the word “nigger” is the most frequent racial term. It exceeds the word “negro” by 50 occurrences. It occurs about a third as often as the word Henry (the main character) and about half as often as the name of the racially ambiguous Charles Bon. Importantly, the number of times a character’s name occurs is not the same as the number of times that character actually appears in the text. After all, the pronouns “he” or “she” could equally well denote a character, but that is not shown here.
Collocation is a process of determining which words appear together. This is done by creating n-grams, where n is the number of consecutive words in a sequence. By determining the n-grams around particular words, we can get a better sense of the context. For example, in her research on British newspapers, Dawn Archer has shown that the most common bigram (n-gram of two) for Muslim is “Muslim terrorist.”(CITE) Certainly this strong association between these two words indicates how Muslims are represented in the British media. In similar fashion, we get a better sense of how Faulkner is using racial language by looking at the words immediately before and after it.
The phrase that stands out the most is one that Rosa Coldfield uses early on: “wild niggers.” It becomes a leitmotif for much of the text, and the phrase will be repeated throughout. Yet who repeats it and how it is repeated will change.
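To make the idea of n-grams concrete, here is a minimal Python sketch of building bigrams and counting those that contain a target word. It is an illustration under simplifying assumptions rather than the author's R workflow: the token list is invented, and n-grams are built over one flat list instead of being restricted to sentence boundaries.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def collocates(tokens, targets, n=2):
    """Count the n-grams that contain at least one of the target words."""
    return Counter(g for g in ngrams(tokens, n) if any(w in targets for w in g))

# Invented token sequence, already lowercased and lemmatized.
tokens = ["wild", "negro", "band", "wild", "negro", "french", "architect"]
counts = collocates(tokens, {"negro"})
most_common_pair = counts.most_common(1)[0]
```

Because the target can sit in either position of the bigram, the count captures both the word immediately before and the word immediately after it.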
In their use of either “wild niggers” or “wild negro,” Quentin and Rosa Coldfield share an inverse relationship. This is curious because it is Rosa who first uses the phrase when referring to the demonic Sutpen arriving in Yoknapatawpha:
Out of quiet thunderclap he would abrupt (man-horse-demon) upon a scene peaceful and decorous as a schoolprize water color, faint sulphur-reek still in hair clothes and beard, with grouped behind him his band of wild niggers like beasts half tamed to walk upright like men, in attitudes wild and reposed, and manacled among them the French architect with his air grim, haggard, and tatter-ran.
It is this initial instance of the phrase, uttered by Rosa, that is carried forward. It is therefore interesting that Quentin takes up this note and appears to repeat it throughout the text. What’s more, Rosa’s initial association between enslavement and wildness is one that will echo across the novel, despite the fact that she says it only once.
We can also look at the word frequency data temporally by casting it across the chapters. This indicates the frequency of a particular word in each chapter. It may be that some racial words are used in one part of the book and not in others. This gives some indication as to its value in the narrative.
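Casting word frequencies across chapters amounts to counting each word once per chapter. The Python sketch below shows one way to do that; it is a hypothetical illustration (invented rows, invented function name), not the author's R code.

```python
from collections import Counter, defaultdict

def counts_by_chapter(rows, words_of_interest):
    """rows are (chapter, word) pairs; return {word: Counter({chapter: count})}."""
    table = defaultdict(Counter)
    for chapter, word in rows:
        if word in words_of_interest:
            table[word][chapter] += 1
    return table

# Invented rows for illustration only.
rows = [(1, "white"), (1, "negro"), (7, "negro"), (7, "negro"), (7, "house")]
table = counts_by_chapter(rows, {"white", "negro"})
```

A chapter in which a word never occurs simply reports a count of zero, which is what allows the series to be drawn as a continuous line across all nine chapters.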
It is clear that chapter 7 is particularly racially charged. While certain narrators predominate in certain chapters, it would be a mistake to attribute particular words to particular characters based on this raw data. We may recall that chapter 7 is a nested narration in which we are told the story of Thomas Sutpen as he related it to General Compson, who told it to Mr. Compson, who told it to Quentin, who is telling it to Shreve. There are so many narrative frames that it is very difficult to determine whose language this is. What is apparent is that the chapter in which most of Sutpen’s life is revealed is steeped in pejorative racist language. To be sure, in all the other chapters the word negro or black is used more frequently to describe African Americans.
Sentiment analysis is a field of CL that tries to establish the emotional valence of a segment of text. It does so through sentiment libraries. These are lists of words that have been hand-coded to indicate certain emotions such as joy, sadness, surprise, or, more broadly, positive and negative. In general, sentiment libraries are used for analyzing social media or large data sets where the narrative data tends to be less complex and operates at scale. Thus, while the sentiment dictionary might not match each sentiment exactly, in the aggregate the predominant emotion rises to the top.
For literary works, sentiment analysis is far more speculative and merits considerable caution. Without a specially trained dictionary for a specific corpus, sentiment analysis can reveal certain patterns around words, but it is unclear what the margin of error might be. There are, so to speak, unknown unknowns. This is particularly true of Faulkner, who uses many emotionally charged words that might not make their way into a sentiment library, or who uses words like “unamaze” to negate a particular emotion, in this case surprise. Any results that sentiment analysis generates should therefore be seen as a prompt for further inquiry and not a final result.
One of the most basic ways to think through sentiment is to track the positive and negative sentiments across a text. The basic procedure is to tag each positive and negative sentiment in a text and then tabulate these by some logical unit, be it a sentence, paragraph, or chapter. This will give you the total sentiment of that particular unit. Since we are interested in the emotion surrounding racial words, it makes the most sense to set the unit boundary at the sentence level. This produces a very granular chart, but for Absalom, Absalom! this granularity is very revealing.
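The tag-and-tabulate procedure can be sketched as follows. This is a simplified Python illustration, not the author's implementation: the mini-lexicon is hypothetical, and scoring each sentence as positive matches minus negative matches is an assumption about how the totals are combined.

```python
# Hypothetical mini-lexicon; a real sentiment library has thousands of entries.
LEXICON = {"grim": -1, "haggard": -1, "wild": -1, "peaceful": 1, "decorous": 1}

def sentence_scores(sentences, lexicon=LEXICON, race_words=frozenset({"negro"})):
    """For each tokenized sentence, return its net sentiment and a racial-word flag."""
    results = []
    for number, tokens in enumerate(sentences, start=1):
        # Words missing from the lexicon contribute nothing to the score.
        score = sum(lexicon.get(t, 0) for t in tokens)
        has_race_word = any(t in race_words for t in tokens)
        results.append({"sentence": number, "score": score, "race_word": has_race_word})
    return results

# Invented, already tokenized sentences.
sentences = [
    ["scene", "peaceful", "decorous"],
    ["wild", "negro", "grim", "haggard"],
]
scores = sentence_scores(sentences)
```

Because the score is a simple sum, longer sentences can accumulate larger totals, which is worth keeping in mind when comparing sentences of very different lengths.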
One of the immediate things that stands out about this chart is just how negatively charged sentences in Absalom, Absalom! are. There are very few positive sentences in this text. The sentences that contain racial words are predominately negative. In fact, the sentence with the most negative emotions attached to it is also racially charged. This is sentence 1421 which, at 969 words, is also one of the longest sentences in the text. If you do not know Absalom, Absalom! by sentence, and I hope you don’t, this is the passage that speaks of Sutpen’s dissolution in the wake of the Civil War, and his drunken parleys with Wash Jones. The reason for the overabundance of negative emotions is both the sentence length and its grotesque content.
Understanding when a certain word is used and in what emotional context does not necessarily indicate who is using it. There is currently no way to determine who is speaking in Absalom, Absalom! This is partly a practical computational issue of matching speaker with dialogue but, more philosophically, we may also wonder whether anyone’s language is truly their own in the text. This is a community that has been shaped by the same story for generations. The cadence, register, and tone all inform particular leitmotifs that occur and reoccur throughout the narrative. Indeed, one of the interesting phenomena that CL reveals is just how often certain turns of phrase are repeated, re-worked, and re-contextualized. The singularity of the speaker is unsettled by the multiplicity of the spoken.
That being said, it is possible to investigate the proximity of racial words relative to characters. The Digital Yoknapatawpha database breaks down a text into events, which are, in turn, composed of locations and characters. By cross-referencing the words with the events, we can get some notion of which words are being used around which characters. This re-composition of the text drops out certain sentences because, on occasion, the event length is very short. In total, this was the case for 5 events, or 0.7% of the entire text. The difference is therefore negligible.
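The cross-referencing of words with events can be pictured with a small Python sketch. The event records below are entirely hypothetical: the field names (first, last, characters) and the use of sentence ranges are assumptions made for illustration and do not reflect the actual Digital Yoknapatawpha schema or the author's R code.

```python
from collections import Counter, defaultdict

# Hypothetical event records: a sentence range plus the characters present or mentioned.
EVENTS = [
    {"id": 1, "first": 1, "last": 2, "characters": ["Thomas Sutpen", "Rosa Coldfield"]},
    {"id": 2, "first": 3, "last": 3, "characters": ["Thomas Sutpen"]},
]

def words_by_character(word_rows, events, words_of_interest):
    """word_rows are (sentence_number, word) pairs; return {character: Counter(word)}."""
    table = defaultdict(Counter)
    for sentence, word in word_rows:
        if word not in words_of_interest:
            continue
        for event in events:
            if event["first"] <= sentence <= event["last"]:
                # Credit the word to every character attached to the event.
                for character in event["characters"]:
                    table[character][word] += 1
    return table

# Invented rows; sentence 4 belongs to no event and therefore drops out.
word_rows = [(1, "white"), (2, "negro"), (3, "negro"), (4, "negro")]
table = words_by_character(word_rows, EVENTS, {"white", "negro"})
```

A word is credited to every character in the event, so the counts measure proximity to a character and say nothing about who is speaking.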
While every character can be matched with the number of each word that occurs with them, this is not a relevant statistic. The words that make the most sense are the five most frequent race words discussed in figure BLANK. These are “blood,” “black,” “negro,” “nigger,” and “white.” As each character necessarily occurs with each word at a different frequency, the top five characters were selected by the number of events in which they appear.
The resulting chart is quite revealing. For most of the characters, the ratio of the word Negro to the n-word is relatively even. The most obvious difference is Thomas Sutpen. In events where he is present or mentioned, the n-word occurs 132 times. Part of the reason for this is that Thomas Sutpen occurs in the most events throughout the text, 320 to be exact. Consequently, it makes sense that he has a higher chance of occurring in those events in which there is a particular racial word.
In order to get a better view into how often a racial word is used in the same event as a character, we need to normalize the data by the number of times the word occurs. This way we can understand, proportionally, how often a character is in an event where a particular word occurs. We know from the previous chart that the n-word occurs 152 times. Dividing the number of occurrences for each character by this number produces the percentage chart below.
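The normalization is a single division. As a worked example in Python, using only the two figures quoted in the text (132 occurrences in events involving Thomas Sutpen, 152 occurrences overall); the function name is invented for this illustration.

```python
def share_of_occurrences(count_with_character, total_occurrences):
    """Percentage of a word's occurrences that fall in events involving a character."""
    return 100 * count_with_character / total_occurrences

# Figures quoted in the text: 132 co-occurrences with Thomas Sutpen out of 152 in total.
sutpen_share = share_of_occurrences(132, 152)
sutpen_share_rounded = round(sutpen_share, 1)
```

Since a single event can involve several characters, these percentages are computed per character and need not sum to 100 across characters.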
The chart reveals, quite dramatically, that Thomas Sutpen is in some way part of the event about 86.8% of the times that the n-word occurs. While it is not clear that he is using the word, it is clear that he is the character with whom the racial epithet is most associated. Indeed, in one event it occurs 16 times. This is when Sutpen is barred from entering the front door of the plantation by the enslaved butler (187). It is this primal incident that shapes so much of Sutpen’s consciousness going forward, and it becomes the gravitational center that pulls in the worst racial language American literature has to offer. At the moment that he becomes aware of his class difference, it redounds to racial antagonism.
The use of racial language is also time bound. Certain words are more prominent during certain periods than others. The DY data also includes speculative dates for each event. These dates consist of both an earliest possible start date and latest possible end date. Needless to say, establishing Faulkner’s chronology is not an exact science and the dates are best seen as an approximate measure. Nevertheless, they do give a general indication around what time the events take place.
To highlight the trajectory of the n-word, the opacity of the other two words has been turned down a bit. What is immediately visible is that the n-word is the most used word in the 1830s and 1860s. These are two great chapters in the Sutpen saga: the establishment of a racial enslavement regime and its dissolution. During Reconstruction, the words Black and Negro are used more frequently, albeit only a small number of times. Interestingly, the lines bifurcate from 1900 to 1910. This is because the year 1909 is grouped with 1900 and not 1910. Likely what is visible is the difference in the usage of the n-word by Rosa Coldfield and Shreve and Quentin.
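The grouping of 1909 with 1900 rather than 1910 is what one would expect if years are floored to the start of their decade. The Python sketch below illustrates that binning; it is a guess at the mechanism rather than the author's code, and it assumes each event has already been reduced to a single year, whereas the DY data supplies an earliest and a latest possible date.

```python
from collections import Counter

def decade(year):
    """Floor a year to the start of its decade, so 1909 falls in 1900."""
    return year // 10 * 10

def counts_by_decade(rows):
    """rows are (year, word) pairs; return a Counter keyed by (decade, word)."""
    return Counter((decade(year), word) for year, word in rows)

# Invented rows for illustration only.
rows = [(1833, "negro"), (1838, "negro"), (1865, "black"), (1909, "negro"), (1910, "negro")]
by_decade = counts_by_decade(rows)
```

Under this scheme two events a single year apart, 1909 and 1910, land in different bins, which can produce an apparent split in a chart even when the underlying events are nearly contemporaneous.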
To be continued…